A New Approach for Automatic Chinese Spelling Correction
Abstract
This article presents a new approach to automatic Chinese spelling error detection and correction. Existing Chinese spelling checking systems have two problems: (1) a low precision rate and (2) a lack of correction capability. The proposed Chinese spelling correction method is composed of two mechanisms: (1) composite confusing character substitution and (2) an advanced word class bigram language model. The characters in the input sentence are first substituted, one by one, by the members of their corresponding composite confusing character sets. A composite confusing set is the collection of characters similar to a given Chinese character from multiple views: shape, pronunciation, meaning, and input keystroke sequence. The substitution step produces several sentence hypotheses for the input sentence. Then an advanced word class bigram language model, such as the inter-word character bigram (IWCB) or the SA-class bigram, is used to score each sentence hypothesis. Finally, the best-scoring sentence hypothesis is compared with the input sentence to determine the typos and their corrections. Experiments show that the proposed approach is very effective in dealing with the two problems mentioned above.
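The substitute-score-compare pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the confusing sets and the four-sentence corpus are hypothetical toy data, a plain character bigram model with add-one smoothing stands in for the IWCB or SA-class bigram model, and each hypothesis changes only one character at a time. The names `CONFUSION_SETS`, `CORPUS`, `train_bigram`, `score`, `hypotheses`, and `correct` are all invented for this example.

```python
import math
from collections import Counter

# Hypothetical toy data (assumption): a real system would use large
# composite confusing sets and a large training corpus.
CONFUSION_SETS = {"在": {"再"}, "再": {"在"}}
CORPUS = ["我们再见", "明天再见", "我在家", "他在学校"]


def train_bigram(corpus):
    """Count character unigrams (as contexts) and bigrams with boundary marks."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        chars = ["<s>"] + list(sent) + ["</s>"]
        unigrams.update(chars[:-1])
        bigrams.update(zip(chars, chars[1:]))
    vocab = {c for s in corpus for c in s} | {"<s>", "</s>"}
    return unigrams, bigrams, len(vocab)


def score(sentence, model):
    """Log-probability of a sentence under an add-one smoothed bigram model."""
    unigrams, bigrams, vocab_size = model
    chars = ["<s>"] + list(sentence) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(chars, chars[1:])
    )


def hypotheses(sentence, confusion_sets):
    """Yield the input itself plus every single-character substitution."""
    yield sentence
    for i, ch in enumerate(sentence):
        for alt in sorted(confusion_sets.get(ch, ())):
            yield sentence[:i] + alt + sentence[i + 1:]


def correct(sentence, confusion_sets, model):
    """Pick the best-scoring hypothesis and diff it against the input."""
    best = max(hypotheses(sentence, confusion_sets), key=lambda h: score(h, model))
    typos = [(i, a, b) for i, (a, b) in enumerate(zip(sentence, best)) if a != b]
    return best, typos


MODEL = train_bigram(CORPUS)
print(correct("明天在见", CONFUSION_SETS, MODEL))
```

Each reported typo is a tuple of position, original character, and suggested replacement. Keeping the unmodified input among the hypotheses means a sentence that already scores best is returned with no typos flagged, which is one way to limit false alarms in a sketch like this.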
Similar resources
A Unified Approach to Transliteration-based Text Input with Online Spelling Correction
This paper presents an integrated, end-to-end approach to online spelling correction for text input. Online spelling correction refers to the spelling correction as you type, as opposed to post-editing. The online scenario is particularly important for languages that routinely use transliteration-based text input methods, such as Chinese and Japanese, because the desired target characters canno...
A New Statistical Approach To Chinese Pinyin Input
Chinese input is one of the key challenges for Chinese PC users. This paper proposes a statistical approach to Pinyin-based Chinese input. This approach uses a trigram-based language model and a statistically based segmentation. To deal with real input, it also includes a typing model which enables spelling correction in sentence-based Pinyin input, and a spelling model for English which ...
Automatic Rule Acquisition for Spelling Correction
This paper describes a new approach to automatically learning linguistic knowledge for spelling correction. A major feature of this approach is the fact that the acquired knowledge is captured in a small set of easily understood rules, as opposed to a large set of opaque features and weights. A perspicuous representation is advantageous in order to best exploit human intuition to understand and...
Extended HMM and Ranking Models for Chinese Spelling Correction
Spelling correction has been studied for many decades, which can be classified into two categories: (1) regular text spelling correction, (2) query spelling correction. Although the two tasks share many common techniques, they have different concerns. This paper presents our work on the CLP-2014 bake-off. The task focuses on spelling checking on foreigner Chinese essays. Compared to online sear...
A Ranker for a Semantic Spell Checker Using Context-Sensitive Features
Nowadays, a large volume of documents is generated daily. Because these documents are written by many different people, they contain spelling errors, and these errors reduce the quality of the documents. Automatic writing assistance tools such as spell checkers and correctors can therefore help improve their quality. Context-sensitive errors are misspelled words that have been...
Publication date: 2007